Search CORE

90 research outputs found

Experiences with the ISOcat Data Category Registry

Author: Broeder D.
Schuurman I.
Windhouwer M.
Publication date: 01/01/2014

RELcat: a Relation Registry for ISOcat data categories

Author: Windhouwer M.
Publication date: 01/01/2012

The ISOcat Data Category Registry basically contains a flat, easily extensible list of data category specifications. To foster reuse and standardization, only very shallow relationships among data categories are stored in the registry. However, to assist crosswalks between various (application) domains, possibly based on personal views, and to counter a possible proliferation of data categories, more types of ontological relationships need to be specified. RELcat is a first prototype of a Relation Registry, which allows arbitrary relationships to be stored. These relationships can reflect the personal view of one linguist or of a larger community. The basis of the registry is a relation type taxonomy that can easily be extended. On the one hand, this allows existing sets of relations specified in, for example, an OWL (2) ontology or a SKOS taxonomy to be loaded. On the other hand, it allows algorithms that query the registry and traverse the stored semantic network to remain ignorant of the original source vocabulary. This paper describes first experiences with RELcat and explains some initial design decisions.

MPG.PuRe
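The RELcat abstract above explains that an extensible relation type taxonomy lets existing relation sets (from OWL or SKOS, for example) be loaded while query algorithms stay ignorant of the source vocabulary. A minimal Python sketch of that idea, assuming a hand-written taxonomy in which source-vocabulary properties such as `skos:exactMatch` are registered as subtypes of generic relation types, so that a traversal only needs to ask for the generic type. The taxonomy, the `RelationRegistry` class and all data category names are hypothetical; this is not RELcat's actual type hierarchy or interface.

```python
from collections import defaultdict, deque

# Hypothetical relation type taxonomy: each type maps to its broader type (None = root).
# Properties from source vocabularies (OWL, SKOS) are placed below generic types.
TAXONOMY = {
    "related": None,
    "same as": "related",
    "almost same as": "related",
    "broader": "related",
    "owl:equivalentClass": "same as",
    "skos:exactMatch": "same as",
    "skos:closeMatch": "almost same as",
    "skos:broader": "broader",
}


def is_subtype(rel_type: str, ancestor: str) -> bool:
    """True if rel_type is ancestor itself or lies below it in TAXONOMY."""
    current = rel_type
    while current is not None:
        if current == ancestor:
            return True
        current = TAXONOMY.get(current)
    return False


class RelationRegistry:
    """Hypothetical in-memory stand-in for a relation registry."""

    def __init__(self) -> None:
        # Directed edges: source -> list of (relation type, target).
        self.edges = defaultdict(list)

    def add(self, source: str, rel_type: str, target: str) -> None:
        if rel_type not in TAXONOMY:
            raise ValueError(f"unknown relation type: {rel_type}")
        self.edges[source].append((rel_type, target))

    def reachable(self, start: str, rel_type: str) -> set[str]:
        """Breadth-first search over edges whose type is rel_type or any subtype of it."""
        seen = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for edge_type, target in self.edges.get(node, []):
                if target not in seen and is_subtype(edge_type, rel_type):
                    seen.add(target)
                    queue.append(target)
        seen.discard(start)
        return seen


# Hypothetical data category names taken from three different vocabularies.
registry = RelationRegistry()
registry.add("myLexicon:pos", "skos:exactMatch", "isocat:partOfSpeech")
registry.add("isocat:partOfSpeech", "owl:equivalentClass", "otherTagset:wordClass")
registry.add("isocat:partOfSpeech", "skos:closeMatch", "otherTagset:category")

# The query only mentions "same as", never the SKOS or OWL property names.
print(sorted(registry.reachable("myLexicon:pos", "same as")))
# ['isocat:partOfSpeech', 'otherTagset:wordClass']
```

The traversal follows directed edges only; a fuller version would probably also treat symmetric types such as "same as" in both directions.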

Towards standardized descriptions of linguistic features: ISOcat and procedures for using common data categories

Author: Windhouwer M.
Publication date: 01/01/2012

Automatic Language Identification of written texts is a well-established area of research in Computational Linguistics. State-of-the-art algorithms often rely on character n-gram models to identify the correct language of texts, with good results reported for European languages. In this paper we propose the use of a character n-gram model and a word n-gram language model for the automatic classification of two written varieties of Portuguese: European and Brazilian. Using character 4-grams, accuracy reached 0.998.

MPG.PuRe
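The abstract listed under this entry describes classifying European versus Brazilian Portuguese with character n-gram models. As a minimal Python sketch of one common way to do that, the code trains an add-one smoothed character 4-gram model per variety and picks the variety with the highest log-probability. The scoring scheme, the `CharNgramClassifier` class and the toy sentences are assumptions for illustration; the paper's actual models, features and data may differ, and two sentences per variety say nothing about real accuracy.

```python
import math
from collections import Counter


def char_ngrams(text: str, n: int = 4) -> list[str]:
    """Overlapping character n-grams of the lower-cased text, padded with one space on each side."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class CharNgramClassifier:
    """Add-one smoothed character n-gram model per label (a naive Bayes-style scorer)."""

    def __init__(self, n: int = 4) -> None:
        self.n = n
        self.counts: dict[str, Counter] = {}
        self.vocab: set[str] = set()

    def train(self, label: str, texts: list[str]) -> None:
        counter = self.counts.setdefault(label, Counter())
        for text in texts:
            grams = char_ngrams(text, self.n)
            counter.update(grams)
            self.vocab.update(grams)

    def log_prob(self, label: str, text: str) -> float:
        counter = self.counts[label]
        total = sum(counter.values())
        vocab_size = len(self.vocab) + 1  # one extra slot for unseen n-grams
        return sum(
            math.log((counter[g] + 1) / (total + vocab_size))
            for g in char_ngrams(text, self.n)
        )

    def classify(self, text: str) -> str:
        return max(self.counts, key=lambda label: self.log_prob(label, text))


# Toy training sentences (illustrative only, not the paper's corpus).
clf = CharNgramClassifier(n=4)
clf.train("pt-PT", ["estou a fazer o trabalho no comboio", "vou apanhar o autocarro"])
clf.train("pt-BR", ["estou fazendo o trabalho no trem", "vou pegar o ônibus"])

print(clf.classify("ele está a ler no comboio"))  # pt-PT
print(clf.classify("ele está lendo no trem"))     # pt-BR
```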

Linking to linguistic data categories in ISOcat

Author: Windhouwer M.
Wright S.
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2012

ISO Technical Committee 37, Terminology and other language and content resources, established an ISO 12620:2009-based Data Category Registry (DCR), called ISOcat (see http://www.isocat.org), to foster semantic interoperability of linguistic resources. However, this goal can only be met if the data categories are reused by a wide variety of linguistic resource types. A resource indicates its usage of data categories by linking to them. The small DC Reference XML vocabulary is used to embed links to data categories in XML documents. The link is established by a URI, which serves as the Persistent IDentifier (PID) of a data category. This paper discusses the efforts to apply the same approach to RDF-based resources. It also introduces RELcat, a Relation Registry based on an RDF quad store, which supports ontological relationships between data categories that ISOcat itself does not support, and thus adds an extra level of linguistic knowledge.

CiteSeerX

Crossref

MPG.PuRe
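The abstract above says that XML documents link to ISOcat data categories by embedding the data category's PID, a URI, through the small DC Reference vocabulary. A minimal Python sketch of how a tool might collect those links with the standard library's ElementTree follows. It assumes the DC Reference namespace `http://www.isocat.org/ns/dcr` with a `dcr:datcat` attribute (the form seen, for example, in CMDI); the element names and the `DC-1234`/`DC-5678` PIDs are hypothetical placeholders.

```python
import xml.etree.ElementTree as ET

# Assumed DC Reference namespace; ElementTree addresses namespaced attributes as {uri}name.
DCR_NS = "http://www.isocat.org/ns/dcr"
DATCAT = f"{{{DCR_NS}}}datcat"

# Toy document: element names and PIDs (DC-1234, DC-5678) are hypothetical.
DOC = """\
<lexicon xmlns:dcr="http://www.isocat.org/ns/dcr">
  <entry>
    <lemma dcr:datcat="http://www.isocat.org/datcat/DC-1234">huis</lemma>
    <pos dcr:datcat="http://www.isocat.org/datcat/DC-5678">noun</pos>
    <note>no data category link here</note>
  </entry>
</lexicon>
"""


def datcat_links(xml_text: str) -> list[tuple[str, str]]:
    """Return (element name, data category PID) for every element with a dcr:datcat attribute."""
    root = ET.fromstring(xml_text)
    return [(el.tag, el.get(DATCAT)) for el in root.iter() if el.get(DATCAT) is not None]


for tag, pid in datcat_links(DOC):
    print(tag, "->", pid)
```

In an RDF-based resource, the same PID would appear as a URI in a triple rather than as an XML attribute value; which property carries it is left open in this sketch.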

FLAT: A CLARIN-compatible repository solution based on Fedora Commons

Author: Trilsbeek P.
Windhouwer M.
Publication date: 21/11/2016

This paper describes the development of a CLARIN-compatible repository solution that fulfils both the long-term preservation requirements and the current-day discoverability and usability needs of an online data repository of language resources. The widely used Fedora Commons open source repository framework, combined with the Islandora discovery layer, forms the basis of the solution. On top of this existing solution, additional modules and tools are developed to make it suitable for the types of data and metadata used by the participating partners.

MPG.PuRe

RELISH LMF: Unlocking the full power of the lexical markup framework

Author: Petro J.
Shayan S.
Windhouwer M.
Publication date: 01/01/2014

MPG.PuRe

Knowledge management for small languages

Author: Verweij H.
Windhouwer M.
Wittenburg P.
Publication date: 01/01/2011

In this paper, an overview is given of the knowledge components needed for extensive documentation of small languages. The Language Archive strives to offer tools for all of these components to the linguistic community. The major tools are described in relation to the knowledge components, followed by a discussion of what is currently lacking and of possible strategies for moving forward.

MPG.PuRe

ISOcat: Remodeling metadata for language resources

Author: Kemps-Snijders M.
Windhouwer M.
Wittenburg P.
Wright S.
Publication venue: Inderscience Publishers
Publication date: 01/01/2009

The Max Planck Institute for Psycholinguistics in Nijmegen, the Netherlands, is creating a state-of-the-art web environment for the ISO TC 37 (terminology and other language and content resources) metadata registry. This Data Category Registry (DCR) is called ISOcat and encompasses data categories for a broad range of language resources. Under the governance of the DCR Board, ISOcat provides an open workspace for creating data category specifications, defining Data Category Selections (DCSs), i.e. domain-specific groups of data categories, and standardising selected data categories and DCSs. The designers envisage future interactivity among the DCR, reference registries and an ontological knowledge space.

MPG.PuRe

Content-based video indexing for the support of digital library search

Author: Agrawal Rakesh
Apers Peter M.G.
Blok H.E.
Jonker Willem
Kersten M.
Petkovic M.
van Zwol Roelof
Windhouwer M.
Publication venue: IEEE Computer Society Press
Publication date: 01/01/2002

This paper presents a digital library search engine that combines the efforts of the AMIS and DMW research projects, each covering a significant part of the problem of finding the required information in an enormous mass of data. The most important contributions of our work are the following: (1) We demonstrate a flexible solution for the extraction and querying of metadata from multimedia documents in general. (2) We illustrate scalability and efficiency support for full-text indexing and retrieval. (3) We show how, for a more limited domain such as an intranet, conceptual modelling can offer additional and more powerful query facilities. (4) In the limited-domain case, we demonstrate how domain knowledge can be used to interpret low-level features as semantic content. In this short description, we focus on the first and fourth items.

CWI's Institutional Repository

Pure OAI Repository

University of Twente Research Information

International Migration, Integration and Social Cohesion online publications

Ensuring semantic interoperability on lexical resources

Author: Kemps-Snijders M.
Ringersma J.
Windhouwer M.
Zinn C.
Publication date: 01/01/2008

In this paper, we describe a unifying approach to tackling data heterogeneity issues for lexica and related resources. We present LEXUS, our software that implements the Lexical Markup Framework (LMF) to describe and manage lexica of different structures uniformly. LEXUS also makes use of a central Data Category Registry (DCR) to address terminological issues concerning linguistic concepts, as well as the handling of working and object languages. Finally, we report on ViCoS, a LEXUS extension that supports the definition of arbitrary semantic relations between lexical entries or parts thereof.

MPG.PuRe
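The LEXUS abstract above describes lexica of different structures managed through one LMF-based model, with ViCoS adding arbitrary semantic relations between lexical entries or parts of them. The Python sketch below models that shape under stated assumptions: a lexicon of entries with senses, plus a list of typed relations whose endpoints may be whole entries or individual senses. The class names are loosely modelled on LMF's core notions (lexicon, lexical entry, sense) but are hypothetical; this is neither an LMF serialization nor the actual LEXUS or ViCoS data model, and the plain-string feature names only stand in for references to DCR data categories.

```python
from dataclasses import dataclass, field


@dataclass
class Sense:
    id: str
    definition: str


@dataclass
class LexicalEntry:
    id: str
    lemma: str
    # Feature names stand in for references to data categories in a DCR such as ISOcat.
    features: dict[str, str] = field(default_factory=dict)
    senses: list[Sense] = field(default_factory=list)


@dataclass
class Relation:
    source: str         # id of a lexical entry or a sense
    relation_type: str  # free-form, user-defined label
    target: str


@dataclass
class Lexicon:
    language: str
    entries: dict[str, LexicalEntry] = field(default_factory=dict)
    relations: list[Relation] = field(default_factory=list)

    def add_entry(self, entry: LexicalEntry) -> None:
        self.entries[entry.id] = entry

    def known_ids(self) -> set[str]:
        """Ids of all entries and of all their senses."""
        ids = set(self.entries)
        for entry in self.entries.values():
            ids.update(sense.id for sense in entry.senses)
        return ids

    def relate(self, source: str, relation_type: str, target: str) -> None:
        """Record a relation between two existing entries or senses."""
        missing = {source, target} - self.known_ids()
        if missing:
            raise KeyError(f"unknown ids: {sorted(missing)}")
        self.relations.append(Relation(source, relation_type, target))

    def related(self, source: str, relation_type: str) -> list[str]:
        return [r.target for r in self.relations
                if r.source == source and r.relation_type == relation_type]


lex = Lexicon(language="nld")
lex.add_entry(LexicalEntry("e1", "huis", {"partOfSpeech": "noun"}, [Sense("e1.s1", "house")]))
lex.add_entry(LexicalEntry("e2", "woning", {"partOfSpeech": "noun"}, [Sense("e2.s1", "dwelling")]))
lex.add_entry(LexicalEntry("e3", "dak", {"partOfSpeech": "noun"}, [Sense("e3.s1", "roof")]))
lex.relate("e1.s1", "synonym", "e2.s1")  # sense-to-sense relation
lex.relate("e3", "part of", "e1")        # entry-to-entry relation
print(lex.related("e1.s1", "synonym"))   # ['e2.s1']
```

Allowing both entry ids and sense ids as endpoints is one way to read "lexical entries or parts thereof"; a richer model would let relations target other parts, such as individual forms, as well.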